The Brazilian Director Who's Up for Multiple Oscars

The New Yorker

Kleber Mendonça Filho wants his films to reclaim lost history.

For Kleber Mendonça Filho, filmmaking is an act of both provocation and preservation. Mendonça was born in 1968, in the early years of a ruthless military dictatorship--a time when cinema, like much else, was harshly constrained. His mother, Joselice Jucá, was a historian who studied Brazil's abolitionist movement, and she taught him that filling gaps in the cultural memory was a way to expose concealed truths. His relationship with film is inextricably linked with his home town, Recife--a port city where attractive beaches and high-rise developments coexist with sprawling favelas and rampant crime.

In his youth, Mendonça was fascinated by the city's grand cinema palaces. He carried a Super 8 camera to the tops of marquees and shot dizzying images; he spent hours in projection booths, learning the mechanics of how films reached the screen. Over time, Mendonça watched those theatres fall into decline, an experience that he likened to being aboard a ship as it wrecked. But even as Recife lost its allure, he made the city a fixture of his films--a way of vindicating its place in history. His first narrative feature, "Neighboring Sounds," takes place on a street where he lived as a child, a setting that he spent years documenting. Later, he made "Pictures of Ghosts," a documentary about Recife told largely through its cinemas.


CodeFlowLM: Incremental Just-In-Time Defect Prediction with Pretrained Language Models and Exploratory Insights into Defect Localization

Monteiro, Monique Louise, Cabral, George G., Oliveira, Adriano L. I.

arXiv.org Artificial Intelligence

CodeT5+: CodeT5+ was initially chosen as one of the baselines because it was among the top-performing models in our defect prediction experiments (Monteiro et al., 2025). Although CodeT5+ does not contain an explicit [CLS] token, unlike BERT-based language models, we still use the first encoded token as the input to the classification head. We therefore maintain the default practice of inspecting the attention weights of the first token.

UniXCoder: As with CodeT5+, UniXCoder was also among the top performers in the defect prediction experiments (Monteiro et al., 2025), so we keep the same default strategy of using the attention weights of the first encoded token.

We also initially considered JIT-Block (Huang et al., 2024) and JIT-CF (Ju et al., 2025). The authors of JIT-Block reconstructed the dataset (JIT-Defects4J) into a changed-block format, which preserves the relative positional information between added and deleted code lines (information lost in traditional datasets), making it easier for the model to learn the semantic meaning of code changes. Because the dataset itself was altered, a fair comparison would not be possible. Finally, according to its published results, JIT-CF does not achieve better results than JIT-Smart. A consolidated overview of the baseline classifiers is presented in Table 2.

3.4 Description of the Experiments

RQ1: How do pre-trained language models perform in comparison to traditional machine learning approaches for continual within-project and cross-project Just-in-Time Software Defect Prediction (JIT-SDP)?
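The first-token strategy described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the shapes, weights, and data are invented, and the point is only that when a model has no [CLS] token, the first position of the encoder output is pooled and fed to a linear classification head.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes: a batch of 2 code changes, 8 tokens, hidden size 16.
hidden_states = rng.normal(size=(2, 8, 16))   # stand-in for encoder output

# Models like CodeT5+ lack an explicit [CLS] token, so the first
# encoded token serves as the sequence representation.
first_token = hidden_states[:, 0, :]          # shape (2, 16)

# A linear classification head on top of that representation.
W = rng.normal(size=(16, 2))
b = np.zeros(2)
logits = first_token @ W + b

# Softmax over the two classes (defect-inducing vs. clean).
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print(probs.shape)
```

Inspecting the attention weights attached to that first position is then the natural way to ask which input tokens the pooled representation attended to.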


Adaptive Detection of Software Aging under Workload Shift

Silva, Rafael Jose Moura, Nascimento, Maria Gizele, Machida, Fumio, Andrade, Ermeson

arXiv.org Artificial Intelligence

Software aging is a phenomenon that affects long-running systems, leading to progressive performance degradation and increasing the risk of failures. To mitigate this problem, this work proposes an adaptive approach based on machine learning for software aging detection in environments subject to dynamic workload conditions. We evaluate and compare a static model with adaptive models that incorporate adaptive detectors, specifically the Drift Detection Method (DDM) and Adaptive Windowing (ADWIN), originally developed for concept drift scenarios and applied in this work to handle workload shifts. Experiments with simulated sudden, gradual, and recurring workload transitions show that static models suffer a notable performance drop when applied to unseen workload profiles, whereas the adaptive model with ADWIN maintains high accuracy, achieving an F1-Score above 0.93 in all analyzed scenarios.
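To make the drift-detector idea concrete, here is a minimal sketch of DDM (Gama et al.'s method, one of the two detectors named above). It is not the paper's code: the stream is synthetic, the warm-up length of 30 samples and the 2-sigma/3-sigma thresholds follow the standard formulation, and a real deployment would feed it the model's prediction errors under the live workload.

```python
import math

class DDM:
    """Minimal Drift Detection Method sketch: monitors a stream of
    0/1 prediction errors and signals drift when the error rate rises
    significantly above its historical minimum."""

    def __init__(self):
        self.n = 0
        self.p = 1.0                   # running error-rate estimate
        self.s = 0.0                   # its binomial standard deviation
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, error):
        self.n += 1
        self.p += (error - self.p) / self.n
        self.s = math.sqrt(self.p * (1 - self.p) / self.n)
        if self.n > 30:
            if self.p + self.s < self.p_min + self.s_min:
                self.p_min, self.s_min = self.p, self.s
            if self.p + self.s >= self.p_min + 3 * self.s_min:
                return "drift"            # error rate clearly degraded
            if self.p + self.s >= self.p_min + 2 * self.s_min:
                return "warning"
        return "stable"

# A baseline of ~5% errors, then a workload shift pushes errors to ~50%.
detector = DDM()
stream = ([0] * 19 + [1]) * 10 + [1, 0] * 100
statuses = [detector.update(e) for e in stream]
print("drift" in statuses)
```

ADWIN plays the same role but maintains a variable-length window and cuts it whenever two sub-windows show significantly different means, which is what lets the adaptive model recover after each simulated workload transition.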


Improving LLM-based Ontology Matching with fine-tuning on synthetic data

Sousa, Guilherme, Lima, Rinaldo, Trojahn, Cassia

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are increasingly being integrated into various components of Ontology Matching pipelines. This paper investigates the capability of LLMs to perform ontology matching directly on ontology modules and to generate the corresponding alignments. It further explores how a dedicated fine-tuning strategy can enhance the model's matching performance in a zero-shot setting. The proposed method incorporates a search space reduction technique to select relevant subsets from both source and target ontologies, which are then used to automatically construct prompts. Recognizing the scarcity of reference alignments for training, a novel LLM-based approach is introduced for generating a synthetic dataset. This process creates a corpus of ontology submodule pairs and their corresponding reference alignments, specifically designed to fine-tune an LLM for the ontology matching task. The proposed approach was evaluated on the Conference, Geolink, Enslaved, Taxon, and Hydrography datasets from the OAEI complex track. The results demonstrate that the LLM fine-tuned on the synthetically generated data exhibits superior performance compared to the non-fine-tuned base model. The key contribution is a strategy that combines automatic dataset generation with fine-tuning to effectively adapt LLMs for ontology matching tasks.
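The "reduced search space feeds automatic prompt construction" step can be sketched as below. Everything here is hypothetical: the entity labels, the template wording, and the answer format are invented for illustration and are not the paper's actual prompts.

```python
def build_matching_prompt(source_entities, target_entities):
    """Assemble a zero-shot ontology-matching prompt from the reduced
    search space: the candidate entity subsets of each ontology."""
    src = "\n".join(f"- {e}" for e in source_entities)
    tgt = "\n".join(f"- {e}" for e in target_entities)
    return (
        "Match equivalent entities between the two ontology modules.\n"
        f"Source entities:\n{src}\n"
        f"Target entities:\n{tgt}\n"
        "Answer as 'source = target' pairs, one per line."
    )

# Toy subsets, as a search-space reduction step might produce them.
prompt = build_matching_prompt(
    ["Conference", "PaperAuthor", "ReviewDeadline"],
    ["Meeting", "Author", "Deadline"],
)
print(prompt)
```

The synthetic-data step then pairs such submodule prompts with LLM-generated reference alignments, yielding fine-tuning examples without hand-built gold standards.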


The use of vocal biomarkers in the detection of Parkinson's disease: a robust statistical performance comparison of classic machine learning models

Sacramento, Katia Pires Nascimento do, Garcia, Elliot Q. C., Vilela, Nicéias Silva, Sacramento, Vinicius P., Ferreira, Tiago A. E.

arXiv.org Artificial Intelligence

Parkinson's disease (PD) is a progressive neurodegenerative disorder that, in addition to directly impairing functional mobility, is frequently associated with vocal impairments such as hypophonia and dysarthria, which typically manifest in the early stages. The use of vocal biomarkers to support the early diagnosis of PD presents a non-invasive, low-cost, and accessible alternative in clinical settings. Thus, the objective of this cross-sectional study was to consistently evaluate the effectiveness of a Deep Neural Network (DNN) in distinguishing individuals with Parkinson's disease from healthy controls, in comparison with traditional Machine Learning (ML) methods, using vocal biomarkers. Two publicly available voice datasets were used. Mel-frequency cepstral coefficients (MFCCs) were extracted from the samples, and model robustness was assessed using a validation strategy with 1000 independent random executions. Performance was evaluated using classification statistics. Since normality assumptions were not satisfied, non-parametric tests (Kruskal-Wallis with Bonferroni post-hoc tests) were applied to verify whether the tested models differed significantly in classifying PD. With an average accuracy of $98.65\%$ and $92.11\%$ on the Italian Voice dataset and the Parkinson's Telemonitoring dataset, respectively, the DNN demonstrated superior performance and efficiency compared to traditional ML models, while also achieving competitive results when benchmarked against relevant studies. Overall, this study confirms the efficiency of DNNs and emphasizes their potential to provide greater accuracy and reliability for the early detection of neurodegenerative diseases using voice-based biomarkers.
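The robustness protocol of 1000 independent random executions can be sketched as follows. This is illustrative only: the features are synthetic stand-ins for MFCC vectors, and a nearest-centroid rule replaces the paper's DNN and ML models; the point is the repeated-random-split loop that produces a score distribution suitable for non-parametric comparison.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for 13-dimensional MFCC vectors, two classes
# (PD vs. healthy control), 100 samples each.
X = np.vstack([rng.normal(0.0, 1.0, (100, 13)),
               rng.normal(1.5, 1.0, (100, 13))])
y = np.array([0] * 100 + [1] * 100)

def one_execution(X, y, rng):
    """One independent run: random train/test split, fit class
    centroids on the training part, score accuracy on the rest."""
    idx = rng.permutation(len(y))
    train, test = idx[:150], idx[150:]
    c0 = X[train][y[train] == 0].mean(axis=0)
    c1 = X[train][y[train] == 1].mean(axis=0)
    d0 = np.linalg.norm(X[test] - c0, axis=1)
    d1 = np.linalg.norm(X[test] - c1, axis=1)
    pred = (d1 < d0).astype(int)
    return (pred == y[test]).mean()

# 1000 independent executions yield an accuracy distribution per model;
# Kruskal-Wallis then compares these distributions across models.
scores = np.array([one_execution(X, y, rng) for _ in range(1000)])
print(round(scores.mean(), 3), round(scores.std(), 3))
```

Because each execution reshuffles the split, the spread of `scores` captures sensitivity to the sampling, which is what the Kruskal-Wallis and Bonferroni post-hoc tests operate on.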



Single Tensor Cell Segmentation using Scalar Field Representations

Vargas, Kevin I. Ruiz, Galdino, Gabriel G., Ren, Tsang Ing, Cunha, Alexandre L.

arXiv.org Artificial Intelligence

We investigate image segmentation of cells under the lens of scalar fields. Our goal is to learn a continuous scalar field on image domains such that its segmentation produces robust instances for the cells present in images. This field is a function parameterized by the trained network, and its segmentation is realized by the watershed method. The fields we experiment with are solutions to the Poisson partial differential equation and a diffusion mimicking the steady-state solution of the heat equation. These solutions are obtained by minimizing only the field residuals; no regularization is needed. This yields a robust regression that diminishes the adverse impact of outliers in the training data and allows for sharp cell boundaries. A single tensor is all that is needed to train a U-Net, which simplifies implementation, lowers training and inference times (and hence energy consumption), and requires a small memory footprint, all attractive features in edge computing. We present competitive results on public datasets from the literature and show that our novel, simple yet geometrically insightful approach can achieve excellent cell segmentation results.
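A minimal sketch of the kind of scalar field involved: solving the Poisson equation -&#8711;&#178;u = 1 inside a binary cell mask with zero boundary values, here by plain Jacobi iteration on a toy 32x32 mask. The mask, grid size, and iteration count are invented for illustration; in the paper the network regresses such a field and the watershed transform extracts instances from it.

```python
import numpy as np

# Toy binary mask containing two square "cells".
mask = np.zeros((32, 32), dtype=bool)
mask[4:14, 4:14] = True
mask[18:28, 18:28] = True

# Solve  -laplace(u) = 1  inside the mask, u = 0 on the background,
# with Jacobi iteration (grid spacing h = 1).
u = np.zeros(mask.shape)
for _ in range(2000):
    nb = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
          np.roll(u, 1, 1) + np.roll(u, -1, 1))
    u = np.where(mask, (nb + 1.0) / 4.0, 0.0)

# The field rises smoothly toward each cell's interior, so each cell
# forms its own basin/peak: exactly the structure watershed segments.
print(u[8, 8] > u[4, 4])  # interior value exceeds the cell's corner
```

A single such tensor is the only training target needed, which is what keeps the pipeline's memory footprint and implementation surface small.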


DermAI: Clinical dermatology acquisition through quality-driven image collection for AI classification in mobile

Bezerra, Thales, Thyago, Emanoel, Cunha, Kelvin, Abreu, Rodrigo, Papais, Fábio, Mauro, Francisco, Lopes, Natália, Medeiros, Érico, Guido, Jéssica, Cruz, Shirley, Borba, Paulo, Ren, Tsang Ing

arXiv.org Artificial Intelligence

AI-based dermatology adoption remains limited by biased datasets, variable image quality, and limited validation. We introduce DermAI, a lightweight, smartphone-based application that enables real-time capture, annotation, and classification of skin lesions during routine consultations. Unlike prior dermoscopy-focused tools, DermAI performs on-device quality checks and local model adaptation. The DermAI clinical dataset encompasses a wide range of skin tones, ethnicities, and source devices. In preliminary experiments, models trained on public datasets failed to generalize to our samples, while fine-tuning with local data improved performance. These results highlight the importance of standardized, diverse data collection aligned with healthcare needs and oriented toward machine learning development.
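As one example of what an on-device quality check might look like, here is a common sharpness heuristic, the variance of the Laplacian, in a few lines of NumPy. This is an illustrative sketch, not DermAI's actual check: the threshold-free comparison, the synthetic images, and the 4-neighbour Laplacian stencil are all assumptions of this example.

```python
import numpy as np

def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian: a standard sharpness score.
    Low values suggest a blurry capture that should be rejected
    before annotation and classification."""
    lap = (np.roll(gray, 1, 0) + np.roll(gray, -1, 0) +
           np.roll(gray, 1, 1) + np.roll(gray, -1, 1) - 4 * gray)
    return lap.var()

rng = np.random.default_rng(0)
sharp = rng.random((64, 64))          # high-frequency detail everywhere
blurry = np.full((64, 64), 0.5)       # flat, detail-free image

print(laplacian_variance(sharp) > laplacian_variance(blurry))  # True
```

Checks of this kind are cheap enough to run on the phone at capture time, which is what allows bad frames to be retaken during the consultation rather than discarded later.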